Optimal two-stage genome-wide association designs based on false discovery rate
Abstract
Genome-wide association studies are likely to be conducted on a large scale in the near future. In such studies, searching over hundreds of thousands of markers for the few that are associated with disease raises the multiple-hypothesis testing problem in a severe form. We explore, in a two-stage design, how the use of the false discovery rate (FDR) can alleviate the burden of a prohibitively strict significance level for single-marker tests and still control the number of false positive findings when there is more than one causal variant. FDR is the expected proportion of false positives among all significant findings. It can be approximated by $\alpha(1-p_0)/[\alpha(1-p_0) + p_0(1-\beta)]$, where $p_0$ is the proportion of true causal markers, $\alpha$ is the type I error rate, and $1-\beta$ is the power of a two-stage study. When 500,000 SNPs are genotyped in the first stage with a fixed SNP array and the most significant SNPs are genotyped in the second stage with standard high-throughput techniques that are 20 times more expensive, a saving of up to 20% in the minimum genotyping cost is achieved for $p_0$ in the range of $10^{-5}$ to $5 \times 10^{-4}$ and FDR in the range of 0.05 to 0.7, compared with using a Bonferroni-corrected significance level. In terms of sample size, the saving is up to 60%. However, these savings come at the cost of more false positive findings. © 2006 Elsevier B.V. All rights reserved.
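As a rough illustration of the FDR approximation in the abstract, the sketch below evaluates FDR ≈ α(1−p0) / [α(1−p0) + p0(1−β)] and inverts it to find the per-marker type I error rate that yields a target FDR. The reading of the formula assumes that the symbols dropped from the scraped text are α and 1−β. The function names, the power of 0.8, and the family-wise level of 0.05 used for the Bonferroni comparison are illustrative assumptions, not values taken from the paper.

```python
def approx_fdr(p0: float, alpha: float, power: float) -> float:
    """Approximate FDR = alpha(1-p0) / [alpha(1-p0) + p0*power].

    p0    : proportion of true causal markers (notation from the abstract)
    alpha : per-marker type I error rate
    power : 1 - beta, the power of the two-stage study
    """
    false_pos = alpha * (1.0 - p0)   # expected fraction of markers that are false positives
    true_pos = p0 * power            # expected fraction of markers that are true positives
    return false_pos / (false_pos + true_pos)


def alpha_for_target_fdr(p0: float, target_fdr: float, power: float) -> float:
    """Solve the approximation above for alpha, given a target FDR."""
    return target_fdr * p0 * power / ((1.0 - target_fdr) * (1.0 - p0))


if __name__ == "__main__":
    n_markers = 500_000        # first-stage SNP count quoted in the abstract
    p0 = 1e-4                  # inside the range 1e-5 to 5e-4 quoted in the abstract
    power = 0.8                # assumed for illustration only
    bonferroni_alpha = 0.05 / n_markers   # assumes a family-wise level of 0.05
    fdr_alpha = alpha_for_target_fdr(p0, target_fdr=0.05, power=power)
    print(f"Bonferroni per-marker alpha: {bonferroni_alpha:.3e}")
    print(f"Per-marker alpha for FDR 0.05: {fdr_alpha:.3e}")
```

Under these assumed inputs the FDR-based threshold is less strict than the Bonferroni one, which is the mechanism behind the cost and sample-size savings the abstract reports. The sketch does not model genotyping cost or the two-stage allocation itself.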
Similar resources
Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control
Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G...
Optimal False Discovery Rate Control for Dependent Data.
This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is s...
Optimal designs for two-stage genome-wide association studies.
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in s...
Two-stage designs for experiments with a large number of hypotheses
MOTIVATION: When a large number of hypotheses are investigated the false discovery rate (FDR) is commonly applied in gene expression analysis or gene association studies. Conventional single-stage designs may lack power due to low sample sizes for the individual hypotheses. We propose two-stage designs where the first stage is used to screen the 'promising' hypotheses which are further investiga...
Programs for calculating the statistical powers of detecting susceptibility genes in case–control studies based on multistage designs
MOTIVATION: A two-stage association study is the most commonly used method among multistage designs to efficiently identify disease susceptibility genes. Recently, some SNP studies have utilized more than two stages to detect disease genes. However, there are few available programs for calculating statistical powers and positive predictive values (PPVs) of arbitrary n-stage designs. RESULTS: We...
Journal title:
- Computational Statistics & Data Analysis
Volume 51
Publication date: 2006